Neural Networks and Deep Learning

Hui Lin @Netlify

Ming Li @Amazon

2019-06-09

Types of Neural Network

Neural Network Application

| Input (x) | Output (y) | Application |
|---|---|---|
| Home features | Price | Real estate |
| Ad, user info | Click on an ad? (0/1) | Online advertising |
| Image | Object (1, …, 10) | Photo tagging |
| Image, radar info | Position of other cars | Autonomous driving |
| Audio | Text transcript | Speech recognition |
| English | Chinese | Machine translation |
| Voice | Voice | Human-computer conversation |

Logistic Regression as A Neural Network

\[X=\left[\begin{array}{cccc} x_{1}^{(1)} & x_{1}^{(2)} & \dotsb & x_{1}^{(m)}\\ x_{2}^{(1)} & x_{2}^{(2)} & \dotsb & x_{2}^{(m)}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n_{x}}^{(1)} & x_{n_{x}}^{(2)} & \dotsb & x_{n_{x}}^{(m)} \end{array}\right]\in\mathbb{R}^{n_{x}\times m}\]

\[y=[y^{(1)},y^{(2)},\dots,y^{(m)}] \in \mathbb{R}^{1 \times m}\]

\(\hat{y}^{(i)} = \sigma(w^Tx^{(i)} + b)\) where \(\sigma(z) = \frac{1}{1+e^{-z}}\)
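
As a minimal sketch of the prediction formula above, the following plain-Python snippet computes \(\hat{y}^{(i)}\) for every column of \(X\). The function names (`sigmoid`, `predict`) and the toy values of `X`, `w`, and `b` are assumptions made for illustration and do not come from the slides.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(X, w, b):
    """Return y_hat for each of the m columns of X (X has n_x rows and m columns)."""
    n_x, m = len(X), len(X[0])
    y_hat = []
    for i in range(m):
        # z = w^T x^(i) + b for sample i (column i of X)
        z = sum(w[j] * X[j][i] for j in range(n_x)) + b
        y_hat.append(sigmoid(z))
    return y_hat

# Hypothetical toy data: n_x = 2 features, m = 3 samples (one sample per column).
X = [[0.0, 1.0, 2.0],
     [0.0, -1.0, 0.5]]
w = [0.5, -0.25]
b = 0.0
y_hat = predict(X, w, b)
print(y_hat)
```

Each prediction lies strictly between 0 and 1, so it can be read as the estimated probability that \(y^{(i)} = 1\).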

Gradient Descent

Neural Network: 0 Layer Neural Network

Neural Network: 1 Layer Neural Network

Deep Neural Network

Across m Samples

Activation Functions

  • ReLU (rectified linear unit), a common choice for hidden layers:
    1. fast computation;
    2. non-linear;
    3. reduced likelihood of vanishing gradients;
    4. unconstrained response.
  • Sigmoid: widely studied in the past, but less effective than ReLU in deep networks because gradients vanish when there are many layers.
  • Hyperbolic tangent (tanh).
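
To make the vanishing-gradient point concrete, here is a small sketch in plain Python of the three activation functions and their derivatives. The function names are chosen for illustration only, and the sample inputs are arbitrary.

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative is 1 for positive inputs and 0 otherwise (taken as 0 at z = 0).
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # never exceeds 1

for z in (-5.0, 0.5, 5.0):
    print(z, relu_grad(z), sigmoid_grad(z), tanh_grad(z))

# Backpropagation multiplies one such derivative per layer, so even the best
# case for sigmoid shrinks quickly with depth: 10 layers give at most 0.25**10.
best_case_sigmoid_10_layers = 0.25 ** 10
```

For large positive or negative inputs the sigmoid and tanh derivatives are close to zero, while the ReLU derivative stays at 1 for any positive input.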

Deal with Overfitting: Regularization

For logistic regression,

\[\min_{w,b} J(w,b)= \frac{1}{m} \sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)}) + \text{penalty}\]

where

\[L_2\ \text{penalty}=\frac{\lambda}{2m}\parallel w \parallel_2^2 = \frac{\lambda}{2m}\sum_{i=1}^{n_x}w_i^2\]

\[L_1\ \text{penalty} = \frac{\lambda}{m}\parallel w \parallel_1 = \frac{\lambda}{m}\sum_{i=1}^{n_x}|w_i|\]

For a neural network,

\[J(w^{[1]},b^{[1]},\dots,w^{[L]},b^{[L]})=\frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)},y^{(i)}) + \frac{\lambda}{2m}\sum_{l=1}^{L} \parallel w^{[l]} \parallel^2_F\]

where \(\parallel w^{[l]} \parallel^2_F = \sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}} (w^{[l]}_{ij})^2\) and \(n^{[l]}\) is the number of units in layer \(l\) (with \(n^{[0]} = n_x\)).
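
The penalty terms can be sketched in a few lines of plain Python. The function names and the toy weights below are assumptions for illustration; the L1 penalty sums the absolute value of each individual weight, and the neural-network penalty sums the squared entries of every layer's weight matrix.

```python
def l2_penalty(w, lam, m):
    """(lambda / 2m) * sum of squared weights."""
    return lam / (2 * m) * sum(wi ** 2 for wi in w)

def l1_penalty(w, lam, m):
    """(lambda / m) * sum of absolute weights."""
    return lam / m * sum(abs(wi) for wi in w)

def frobenius_sq(W):
    """Squared Frobenius norm: sum of squares of all entries of matrix W."""
    return sum(wij ** 2 for row in W for wij in row)

def nn_l2_penalty(weights, lam, m):
    """(lambda / 2m) * sum over layers of the squared Frobenius norm."""
    return lam / (2 * m) * sum(frobenius_sq(W) for W in weights)

# Hypothetical values: lambda = 0.1, m = 10 samples.
w = [3.0, -4.0]
print(l2_penalty(w, lam=0.1, m=10))   # 0.1 / 20 * 25
print(l1_penalty(w, lam=0.1, m=10))   # 0.1 / 10 * 7

# Two hypothetical layers: a 2x2 weight matrix and a 1x2 weight matrix.
layers = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, -1.0]]]
print(nn_l2_penalty(layers, lam=0.1, m=10))  # 0.1 / 20 * (30 + 2)
```

Only the weights are penalized here; the bias terms are left out, matching the formulas above.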

Deal with Overfitting: Dropout

Deal with Overfitting

Is it overfitting?

Batch, Mini-batch, Stochastic Gradient Descent

\[X = [\,\underbrace{x^{(1)},x^{(2)},\cdots,x^{(1000)}}_{\text{mini-batch 1}} \mid \cdots \mid \cdots, x^{(m)}\,], \quad \text{shape } (n_{x},m)\]

\[y = [\,\underbrace{y^{(1)},y^{(2)},\cdots,y^{(1000)}}_{\text{mini-batch 1}} \mid \cdots \mid \cdots, y^{(m)}\,], \quad \text{shape } (1,m)\]
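
The split into mini-batches can be sketched as follows, assuming one sample per column as in the notation above. The generator name `minibatches` and the toy sizes (m = 2500, batch size 1000) are hypothetical. In practice the samples are usually shuffled before splitting, which this sketch omits.

```python
def minibatches(X, y, batch_size):
    """Yield (X_batch, y_batch) pairs; X has n_x rows and m columns, y has m entries."""
    m = len(y)
    for start in range(0, m, batch_size):
        end = min(start + batch_size, m)
        X_batch = [row[start:end] for row in X]  # keep all n_x rows, slice the columns
        y_batch = y[start:end]
        yield X_batch, y_batch

# Hypothetical toy data: n_x = 2 features, m = 2500 samples.
m = 2500
X = [[float(i) for i in range(m)], [float(-i) for i in range(m)]]
y = [i % 2 for i in range(m)]

sizes = [len(y_batch) for _, y_batch in minibatches(X, y, batch_size=1000)]
print(sizes)
```

Setting `batch_size` to m gives batch gradient descent (one update per pass over the data), and setting it to 1 gives stochastic gradient descent (one update per sample).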

Recap of A Few Key Concepts

MNIST Dataset

Use Keras R Package

  1. Data preprocessing (from image to list of input features)
    • Flatten each 28x28 matrix of grayscale values \(\rightarrow\) 784 feature columns
    • Scale the values to between 0 and 1 by dividing each value by 255
    • Make the response categorical (i.e. 10 columns, with the column for the corresponding digit set to 1 and the rest set to 0)
  2. Load the keras package and build a neural network with a few layers
    • Define a placeholder object for the NN structure
    • 1st layer: 256 nodes, fully connected, ‘relu’ activation, connected to the 784 input features
    • 2nd layer: 128 nodes, fully connected, ‘relu’ activation
    • 3rd layer: 64 nodes, fully connected, ‘relu’ activation
    • 4th layer: 10 nodes, fully connected, ‘softmax’ activation, connected to the 10 output columns
    • Add dropout to the first three layers to prevent overfitting
  3. Compile the NN model: define the loss function, optimizer, and metrics to track

  4. Fit the NN model on the training dataset: define the number of epochs, the mini-batch size, and the size of the validation set on which the metrics are checked during training

  5. Predict on the testing dataset using the fitted NN model
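
The preprocessing in step 1 and the layer sizes in step 2 can be illustrated with a short plain-Python sketch. This is not the keras R code used in the slides: the helper names (`flatten_and_scale`, `one_hot`, `dense_param_count`) and the toy image are hypothetical, and the snippet only shows the shapes and the arithmetic involved.

```python
def flatten_and_scale(image):
    """Turn a 28x28 matrix of 0-255 grayscale values into 784 features in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def one_hot(digit, num_classes=10):
    """Encode a digit as 10 columns: 1 in the digit's column, 0 elsewhere."""
    return [1 if k == digit else 0 for k in range(num_classes)]

def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical image: white pixels (255) on the diagonal, black (0) elsewhere.
image = [[255 if r == c else 0 for c in range(28)] for r in range(28)]
features = flatten_and_scale(image)
label = one_hot(3)

# Layer sizes from the list above: 784 inputs -> 256 -> 128 -> 64 -> 10 outputs.
n_params = dense_param_count([784, 256, 128, 64, 10])
print(len(features), label, n_params)
```

Dropout adds no trainable parameters, so the count depends only on the four fully connected layers.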

R Scripts

Deep Learning Models Across Platforms